HitNet: Hybrid Ternary Recurrent Neural Network

Neural Information Processing Systems

Quantization is a promising technique for reducing the model size, memory footprint, and computational cost of recurrent neural networks (RNNs) on embedded devices with limited resources. Although extreme low-bit quantization has achieved impressive success on convolutional neural networks, it still suffers from large accuracy degradation on RNNs at the same low-bit precision. In this paper, we first investigate the accuracy degradation of RNN models under different quantization schemes, and the distribution of tensor values in the full-precision model. Our observations reveal that, because the distributions of weights and activations differ, different quantization methods suit different parts of the model. Based on these observations, we propose HitNet, a hybrid ternary recurrent neural network, which bridges the accuracy gap between the full-precision model and the quantized model. In HitNet, we develop a hybrid quantization method to quantize weights and activations. Moreover, we introduce a sloping factor into the activation functions, motivated by prior work on Boltzmann machines, further closing the accuracy gap between the full-precision model and the quantized model.
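The abstract does not spell out how the sloping factor enters the activation functions. A minimal sketch of one plausible reading follows, in which the factor divides the pre-activation so that outputs are pushed toward the saturation values that ternary levels represent well. The form `f(x / lam)`, the parameter name `lam`, and the function names are assumptions for illustration, not the authors' definition.

```python
import math

def sloped_sigmoid(x, lam=1.0):
    """Sigmoid with a hypothetical sloping factor lam: sigmoid(x / lam).

    lam < 1 steepens the curve, so outputs sit closer to 0 or 1.
    """
    z = x / lam
    # Branch on the sign of z so exp() never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sloped_tanh(x, lam=1.0):
    """Tanh with the same hypothetical sloping factor: tanh(x / lam)."""
    return math.tanh(x / lam)
```

With `lam=1.0` both functions reduce to the ordinary sigmoid and tanh; a smaller `lam` moves a given input closer to saturation, which is the effect this sketch assumes the sloping factor is meant to have.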


Reviews: HitNet: Hybrid Ternary Recurrent Neural Network

Neural Information Processing Systems

The authors study the problem of quantizing recurrent neural networks. While extreme low-bit quantization (2-bit quantization) has achieved strong results for CNNs, such quantization has so far performed poorly for recurrent neural networks. The goal of this paper is thus to identify the reason for this observation and to propose an extreme quantization scheme better suited to RNNs. First, the authors compare different weight quantization schemes: 2-bit uniform quantization, thresholded ternary quantization (TTQ), and Bernoulli ternary quantization (BTQ). This comparison is performed using an RNN trained on Penn Treebank.
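The three schemes named in the review can be sketched roughly as follows. This is an illustrative sketch, not the paper's implementation: the function names are hypothetical, the threshold `ratio * mean(|v|)` is a common heuristic from the ternary-weights literature rather than something stated here, and the Bernoulli variant assumes inputs already lie in [-1, 1].

```python
import random

def uniform_2bit(values):
    """Uniform 2-bit quantization: snap each value to one of 4 evenly spaced levels."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)
    step = (hi - lo) / 3  # 4 levels span 3 intervals
    return [lo + round((v - lo) / step) * step for v in values]

def threshold_ternary(values, ratio=0.7):
    """Threshold-based ternarization to {-alpha, 0, +alpha}.

    Assumptions: theta = ratio * mean(|v|), and alpha is the mean
    magnitude of the values that survive the threshold.
    """
    theta = ratio * sum(abs(v) for v in values) / len(values)
    kept = [abs(v) for v in values if abs(v) > theta]
    alpha = sum(kept) / len(kept) if kept else 0.0
    return [0.0 if abs(v) <= theta else (alpha if v > 0 else -alpha)
            for v in values]

def bernoulli_ternary(values, rng=random):
    """Stochastic ternarization to {-1, 0, +1}.

    Each value v (assumed in [-1, 1]) becomes sign(v) with
    probability |v| and 0 otherwise.
    """
    out = []
    for v in values:
        fire = rng.random() < min(abs(v), 1.0)
        out.append((1.0 if v > 0 else -1.0) if fire else 0.0)
    return out
```

The thresholded variant is deterministic and zeroes out small magnitudes, while the Bernoulli variant is stochastic and preserves the value in expectation, which is one reason the two may suit different tensor distributions.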


HitNet: Hybrid Ternary Recurrent Neural Network

Wang, Peiqi, Xie, Xinfeng, Deng, Lei, Li, Guoqi, Wang, Dongsheng, Xie, Yuan

Neural Information Processing Systems
